\section{Our technique}
\label{sec-our-technique}

\subsection{Parsing clauses}
\label{sec-our-technique-parsing-clauses}

In order to parse \texttt{loop} clauses, we use a simplified version
of a parsing technique known as \emph{combinator parsing}
\cite{Wadler:1985:RFL:5280.5288}.  It is simplified in that we do not
need the full backtracking power of this technique.  It is fairly easy
to structure the individual clause and sub-clause parsers so that a
parser either succeeds or fails, simply because the \texttt{loop}
grammar is unambiguous at this level.  When a parser succeeds, the
result is correct and unambiguous, and when it fails, other parsers
are tried in sequence.  Obtaining the correct result requires the
individual parsers to be tried in a particular order, which is a
disadvantage of the simplification.  As discussed in
\refSec{sec-using-external-parser-framework}, we plan to avoid this
restriction by using an external parsing framework.

With this parsing technique, client code defines \emph{elementary
  parsers} that are then combined using combinators such as
\emph{alternative} and \emph{sequence}.  The resulting parser code is
\emph{modular} in that individual parsers do not have to be listed in
one single place.  For the \texttt{loop} clauses, this modularity
means that each type of clause can be defined in a different module.

In our parsing framework, an individual parser is an ordinary
\commonlisp{} function that takes a list of \commonlisp{} expressions
and that returns three values:

\begin{enumerate}
\item A generalized Boolean%
  \footnote{The term \emph{generalized Boolean} is part of the
    \commonlisp{} \cite{ansi:common:lisp} terminology.  It means any
    value where \texttt{nil} stands for \emph{false} and any other
    value stands for \emph{true}.}
  indicating whether the parse succeeded.
\item The result of the parse.  If the parse does not succeed, then
  this value is unspecified.
\item A list of the tokens that remain after the parse.  If the
  parse does not succeed, then this list contains the original
  list of tokens passed as an argument.
\end{enumerate}

Consider the following example:

{\small\begin{verbatim}
       (define-parser arithmetic-up-1-parser
         (consecutive
          (lambda (var type-spec from to by)
            (make-instance 'for-as-arithmetic-up
              :order '(from to by)
              :var-spec var
              :type-spec type-spec
              :start-form from
              :end-form (cdr to)
              :by-form by
              :termination-test (car to)))
          'simple-var-parser
          'optional-type-spec-parser
          (alternative 'from-parser
                       'upfrom-parser)
          (alternative 'to-parser
                       'upto-parser
                       'below-parser)
          'by-parser))
\end{verbatim}}

\noindent
The macro \texttt{define-parser} defines a named parser.  This parser
consists of four consecutive parsers:

\begin{enumerate}
\item A parser that recognizes a simple variable.  The result of this
  parser is the variable.
\item A parser that recognizes an optional type specifier.  The
  result of this parser is the type specifier or \texttt{t} if the
  type specifier is absent.
\item A parser that recognizes one of the \texttt{loop} keywords
  \texttt{from} or \texttt{upfrom} followed by a form.  The result of
  the parser is the form.
\item A parser that recognizes one of the \texttt{loop} keywords
  \texttt{to}, \texttt{upto}, or \texttt{below} followed by a form.
  The result of this parser is a \texttt{cons}, where the \texttt{car}
  is either the symbol \texttt{$<$} or the symbol \texttt{$<=$}
  depending on which keyword was recognized, and the \texttt{cdr} is
  the form.
\end{enumerate}

The function defined by the \texttt{lambda} expression combines the
results of those four parsers into a single result for the
newly defined parser.  In this example, the result of the new parser
is an instance of the class \texttt{for-as-arithmetic-up}.

Initially, the \texttt{loop} body is parsed as a sequence of
individual \texttt{loop} clauses, without any consideration for the
order between those clauses.  A failure to parse during this phase
will manifest itself as an error relating to a particular clause,
whether it is in a valid position or not.  Furthermore, ignoring
restrictions on clause ordering allows us to check the syntax of each
clause.  If order had been taken into account, we would either have to
abandon the parsing phase when a syntactically correct clause were
found in the wrong position and thereby being unable to verify
subsequent clauses, or else we would have to implement some
sophisticated error recovery, allowing the parsing process to continue
after a failure.

Our technique for parsing clauses does not work very well for
signaling useful errors when a clause fails to parse.  In
\refSec{sec-second-clause-parser}, we discuss our plans for improving
the situation.

\subsection{Representing parsed clauses}
\label{sec-our-technique-representing-parsed-clauses}

The result of the initial parsing process is a list of clauses, where
each clause has been turned into an instance of (a subclass of) the
class \texttt{clause}.

The classes representing different clauses are organized into a graph
that mostly mirrors the names and descriptions of different
clause types defined by the \commonlisp{} standard.

So for example, the class named \texttt{main-clause} is
the root class of all clauses of that type mentioned in the standard.
The same is true for \texttt{variable-clause}, \texttt{name-clause},
etc. \seeapp{loop-syntax}

Classes representing clauses that admit the \texttt{loop} keyword
\texttt{and} also have a list of sub-clauses.

This organization allows us to capture commonalities between different
clause types by defining methods on generic functions that are
specialized to the appropriate class in this graph.

In addition to representing each clause as an instance of the
\code{clause} class, we also represent the \texttt{loop} body itself
as an instance of the class named \texttt{loop-body}.  This instance
contains a list of all the clauses, but also other information, in
particular about default accumulation for this call to the
\texttt{loop} macro.

\subsection{Semantic analysis}
\label{sec-our-technique-semantic-analysis}

We use generic functions to analyze the contents of the parsed
clauses, and to generate code from them.  The reason for using generic
functions is again one of modularity.  A method specialized to a
particular clause type, represented by a particular standard class,
can be textually close to other code related that clause type.

Checking the validity of the order between clauses is done in the
first step of the \emph{semantic analysis}, allowing us to signal
pertinent error conditions if the restrictions concerning the order of
clauses are not respected.

Next, we verify that the variables introduced by a clause are unique
when it would not make sense to have multiple occurrences of the same
variable.  We also verify that there is at most one \emph{default
accumulation category}, i.e, one of the categories \emph{list},
\emph{min/max}, and \emph{count/sum}.

\subsection{Code generation}
\label{sec-our-technique-code generation}

Our code generation consists of a direct expansion to lower-level
\commonlisp{} code.  We do not make any attempts to detect problems
such as unused variables, type conflicts, etc.  All these problems
will be detected by the compiler when it processes the expanded code.

The main control structure for code generation consists of two steps:

\begin{itemize}
\item First, the \texttt{loop} prologue, the \texttt{loop} body, and
  the \texttt{loop} epilogue are constructed in the form of a
  \texttt{tagbody}%
  \footnote{For readers unfamiliar with \commonlisp{}, the
    \texttt{tagbody} special form allows low-level constructs such as
    arbitrary control transfers to arbitrary statements through the
    use of labels and \texttt{go} forms that jump to such labels.
    This special form is mostly used in the expansion of high-level
    macros such as \texttt{loop} and other iteration constructs.}
  special form.
\item To the resulting \texttt{tagbody} form is then applied a set of
  nested \emph{wrappers}, one for each clause.  A wrapper for a clause
  typically contains \texttt{let} \emph{bindings} required for the
  clause, but also iterator forms where such iterators are required by
  the clause type, for example \texttt{with-package-iterator}.
\end{itemize}

The \texttt{loop} body consists of three consecutive parts:

\begin{enumerate}
\item The \emph{main} body, containing code for the \texttt{do} clauses
  and the accumulation clauses.
\item The \emph{termination-test} part, containing code that checks
  whether iteration should terminate.
\item The \emph{stepping} part, containing code that updates iteration
  variables in preparation for the next iteration.
\end{enumerate}

\noindent
For a small example of expanded code, consider the following
\texttt{loop} form:

{\small\begin{verbatim}
       (loop for i from 2 to 20 
             when (> i 10) do (print i))
\end{verbatim}}

\noindent
It expands to%
\footnote{We cleaned it up somewhat by removing unnecessary
  \texttt{progn} forms and we inserted the comments manually.}
the following code:

{\small\begin{verbatim}
       (macrolet ((loop-finish ()
                    '(go #:g956)))
         (block nil
           (let ((#:g957 2) (#:g958 20) (#:g959 1))
             (let ((#:g960 #:g957) (i #:g957))
               (tagbody
                  (if (<= #:g960 #:g958)
                      (incf #:g960 #:g959)
                      (go #:g956))
                #:g963
                  ;; main body
                  (let ((#:g964 (> i 10)))
                    (if #:g964
                        (print i)
                        (progn)))
                  ;; termination test
                  (unless (<= #:g960 #:g958)
                    (go #:g956))
                  ;; stepping
                  (progn (setq i #:g960) 
                         (incf #:g960 #:g959))
                  (go #:g963)
                #:g956
                  (return-from nil nil))))))
\end{verbatim}}

The essence of code generation is handled by a number of generic
functions, each extracting different information from a clause:

\begin{itemize}
\item \texttt{accumulation-variables} extracts the accumulation
  variables of a clause, indicating also whether the \texttt{loop}
  keyword \texttt{into} is present.
\item \texttt{declarations} extracts any declarations that result from
  the clause.
\item \texttt{prologue-form} returns a form that should go in the
  \texttt{loop} prologue, or \texttt{nil} if no prologue form is
  required for the clause.
\item \texttt{epilogue-form} returns a form that should go in the
  \texttt{loop} epilogue, or \texttt{nil} if no epilogue form is
  required for the clause.
\item \texttt{termination-form} returns a form that should become a
  termination test, or \texttt{nil} if the clause does not result in a
  termination test.
\item \texttt{step-form} returns a form that should be included in the
  stepping part of the \texttt{loop} body, for those clause types that
  define stepping.  This generic function returns \texttt{nil} if the
  clause does not have any step forms associated with it.
\item \texttt{body-form} returns a form that should be present in the
  main body of the expansion, or \texttt{nil} if the clause does not
  result in any form for the body.
\end{itemize}

The generic function \texttt{prologue-form} takes a clause argument
and returns a form that should go in the \texttt{loop} prologue.  The
\texttt{initially} clause is an obvious candidate for such code.  But
the stepping clauses also have code that goes in the prologue, namely
an initial termination test to determine whether any iterations at all
should be executed.

Of the clause types defined by the \commonlisp{} standard, only the
\texttt{finally} clause has a method that returns a value other than
\texttt{nil} on the generic function \texttt{epilogue-form}.

The generic function \texttt{termination-form} takes a clause argument
and returns a form for that clause that should go in the
termination-test part of the body of the expanded code.  Some of the
\texttt{for/as} clauses and also the \texttt{repeat} clause have
specialized methods on this generic function.

The generic function \texttt{step-form} takes a clause argument and
returns a form for that clause that should go in the stepping part of
the body of the expanded code.  The \texttt{for/as} clauses and also
the \texttt{repeat} clause have specialized methods on this generic
function.

The generic function \texttt{body-form} takes a clause argument and
returns a form for that clause that should go in the main body of the
expanded code.  The \texttt{do} and the accumulation clauses have
specialized methods on this generic function.

\subsection{Tests}
\label{sec-our-technique-tests}

Our code has been thoroughly tested.  The code for testing contains
almost $5000$ lines.  This code has been taken from Paul Dietz'
ANSI test suite%
\footnote{See: https://gitlab.common-lisp.net/groups/ansi-test}
and adapted to our needs.  In particular, we had to remove some tests
that did not conform to the standard, and we added tests where the
test suite omitted to test potentially non-conforming behavior.
